REPORT DOCUMENTATION PAGE

Report number: 85
Title: Binomial N estimation: A Bayes empirical Bayes approach
Type of report and period covered: TR 12/86 - 6/88
Author: Adrian E. Raftery
Contract or grant number: N00014-84-C-0169
Performing organization name and address: Department of Statistics, GN-22, University of Washington, Seattle, WA 98195
Program element, project, task area and work unit numbers: NR-661-003
Controlling office name and address: ONR Code N63374, 1107 NE 45th Street, Seattle, WA 98105
Report date: July 1986
Number of pages: 12
Security class (of this report): Unclassified
Distribution statement (of this report): Approved for public release: distribution unlimited.
Key words: Binomial N estimation; Bayes empirical Bayes



Binomial  N  Estimation:  A  Bayes  Empirical  Bayes  Approach 
ABSTRACT

A Bayes empirical Bayes approach to the problem of estimating N in the
binomial distribution is presented. This provides a simple and flexible way of
specifying prior information, and also allows a convenient representation of
vague prior knowledge. In addition, it yields a solution to the interval estimation
problem. The Bayes estimator corresponding to the relative squared error loss
function and a vague prior distribution is shown to be stable, and to compare
favorably with the estimators introduced by Olkin et al. (1981) and Carroll and
Lombard (1985).

1. INTRODUCTION

Suppose $x = (x_1, \ldots, x_n)$ is a set of success counts from a binomial distribution with
unknown parameters N and $\theta$. The problem of estimating N was first considered by Haldane
(1942), who proposed the method of moments estimator, and Fisher (1942), who derived the
maximum likelihood estimator. DeRiggi (1983) showed that the relevant likelihood function is
unimodal. However, Olkin, Petkau, and Zidek (1981) - hereafter OPZ - showed that both these
estimators can be unstable in the sense that a small change in the data can cause a large change
in the estimate of N.

OPZ introduced modified estimators and showed that they are stable. On the basis of a
simulation study, they recommended the estimator which they called MME:S. Casella (1986)
suggested a more refined way of deciding whether or not to use a stabilised estimator.
Kappenman (1983) introduced the "sample reuse" estimator; this performed similarly to MME:S
in a simulation study, and is not further considered here. The history and applications of the
problem were discussed in more detail by OPZ; a recent application was described by Dahiya
(1980), who used the maximum likelihood estimator to estimate the population sizes of different
types of organism in a plankton sample.

Draper and Guttman (1971) adopted a Bayesian approach, and gave a full solution for the
case where N and $\theta$ are independent a priori, the prior distribution of N is uniform, and that of $\theta$
is beta. Blumenthal and Dahiya (1981) suggested $N^*$ as an estimator of N, where $(N^*, \theta^*)$ is the
joint posterior mode of $(N, \theta)$ with the Draper-Guttman prior. However, they did not say how
the parameters of the beta prior for $\theta$ should be chosen. Carroll and Lombard (1985) - hereafter
CL - recommended the N estimator Mbeta(1,1), the posterior mode of N with the Draper-Guttman
prior after integrating out $\theta$, where the prior of $\theta$ has the form
$p(\theta) \propto \theta(1-\theta)$ $(0 \le \theta \le 1)$.


Most of these papers were concerned almost exclusively with point estimation; interval
estimation has been little studied. The simpler problem of estimating N when $\theta$ is known has
been addressed by Feldman and Fox (1968), and Hunter and Griffiths (1978).

I  adopt  a  Bayes  empirical  Bayes  approach  (Deely  and  Lindley  1981).  This  provides  a 
simple  way  of  specifying  prior  information,  and  also  allows  a  convenient  representation  of  vague 
prior  knowledge  using  limiting,  improper,  prior  forms.  It  leads  to  solutions  of  both  the  point 
estimation and interval estimation problems. The Bayes estimator corresponding to the relative
squared error loss function and a vague prior distribution is shown in Section 3 to be stable, and,
using simulation, to compare favorably with both MME:S and Mbeta(1,1).

2.  A  BAYES  EMPIRICAL  BAYES  APPROACH 

I assume that N has a Poisson distribution with mean $\mu$. This defines an empirical Bayes
model in the sense of Morris (1983). Then $x_1, \ldots, x_n$ are realisations of a Poisson random
variable with mean $\lambda = \mu\theta$. I carry out a Bayesian analysis of this model.

I specify the prior distribution in terms of $(\lambda, \theta)$ rather than $(\mu, \theta)$. This is because, if the
prior is based on past experience, it would seem easier to formulate prior information about $\lambda$,
the mean of the observations, than about $\mu$, the mean of the unobserved quantity N. If this is so,
prior information about $\lambda$ would be more precise than that about $\mu$ or $\theta$, so that it may be more
reasonable to assume $\lambda$ and $\theta$ independent a priori than $\mu$ and $\theta$. In this case, $\mu$ and $\theta$ would be
negatively associated a priori. Jewell (1985) has proposed a solution to the different but related
problem of population size estimation from capture-recapture sampling, which is based on an
assumption similar to prior independence of $\mu$ and $\theta$ in the present context.


The  posterior  distribution  of  N  is 

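$$p(N \mid x) \propto (N!)^{-1} \left\{ \prod_{i=1}^{n} \binom{N}{x_i} \right\} \int_0^1 \int_0^\infty \theta^{-N+S} (1-\theta)^{nN-S} \lambda^{N} \exp(-\lambda/\theta) \, p(\lambda, \theta) \, d\lambda \, d\theta \qquad (N \ge x_{\max}) \qquad (2.1)$$

where $S = \sum_{i=1}^{n} x_i$ and $x_{\max} = \max\{x_1, \ldots, x_n\}$. If $\lambda$ and $\theta$ are independent a priori, and $\lambda$ has a
gamma prior distribution, so that $p(\lambda, \theta) \propto \lambda^{\kappa_1 - 1} e^{-\kappa_2 \lambda} p(\theta)$, then $\lambda$ can be integrated out
analytically, and (2.1) becomes

$$p(N \mid x) \propto (N!)^{-1} \, \Gamma(N + \kappa_1) \left\{ \prod_{i=1}^{n} \binom{N}{x_i} \right\} \int_0^1 \theta^{-N+S} (1-\theta)^{nN-S} (\theta^{-1} + \kappa_2)^{-(N+\kappa_1)} \, p(\theta) \, d\theta .$$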
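
The step from (2.1) to the gamma-prior form is a standard gamma integral. The sketch below assumes the gamma prior is written as $p(\lambda) \propto \lambda^{\kappa_1 - 1} e^{-\kappa_2 \lambda}$; the symbols $\kappa_1$ and $\kappa_2$ are the reading of the prior parameters adopted here. Collecting the factors of (2.1) that depend on $\lambda$ gives:

```latex
\begin{align*}
\int_0^\infty \lambda^{N} e^{-\lambda/\theta} \, \lambda^{\kappa_1 - 1} e^{-\kappa_2 \lambda} \, d\lambda
  &= \int_0^\infty \lambda^{N + \kappa_1 - 1} \exp\{-\lambda(\theta^{-1} + \kappa_2)\} \, d\lambda \\
  &= \Gamma(N + \kappa_1) \, (\theta^{-1} + \kappa_2)^{-(N + \kappa_1)} .
\end{align*}
```

The remaining factors, $\theta^{-N+S}(1-\theta)^{nN-S} p(\theta)$, do not involve $\lambda$ and stay under the integral over $\theta$.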

I now consider the case where vague prior knowledge about the model parameters is
represented by limiting, improper, prior forms. I use the prior $p(\lambda, \theta) \propto \lambda^{-1}$, which is the product
of the standard vague prior for $\lambda$ (Jaynes 1968) with a uniform prior for $\theta$. This leads to the
same solution as if a similar vague prior were used for $(\mu, \theta)$, namely $p(\mu, \theta) \propto \mu^{-1}$. The posterior
is

$$p(N \mid x) \propto \frac{(nN - S)!}{(nN + 1)! \, N} \left\{ \prod_{i=1}^{n} \binom{N}{x_i} \right\} \qquad (N \ge x_{\max}) \qquad (2.2)$$

In the important special case where n = 1, (2.2) becomes

$$p(N \mid x) = x_1 / \{N(N+1)\} \qquad (N \ge x_1),$$

so that the posterior median is $2x_1$, which seems intuitively reasonable.
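
The vague-prior posterior and its n = 1 special case can be checked numerically. The following is a minimal sketch, not the author's code: it evaluates the posterior on a truncated support and renormalises. The function names and the truncation point `n_max` are assumptions made for illustration. Because the posterior tail decays only like $N^{-2}$, the truncation leaves a small visible error.

```python
import math


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def posterior_22(x, n_max=5000):
    """Vague-prior posterior of N on max(x), ..., n_max, renormalised.

    Assumes max(x) >= 1. n_max is a hypothetical truncation point and is
    not part of the source.
    """
    n, s = len(x), sum(x)
    support = list(range(max(x), n_max + 1))
    log_w = [
        math.lgamma(n * big_n - s + 1)      # log (nN - S)!
        - math.lgamma(n * big_n + 2)        # log (nN + 1)!
        - math.log(big_n)                   # log (1 / N)
        + sum(log_choose(big_n, xi) for xi in x)
        for big_n in support
    ]
    top = max(log_w)                        # subtract the max for stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return support, [v / total for v in w]


# Compare with the closed form x_1 / {N (N + 1)} for a single count x_1 = 5.
support, probs = posterior_22([5])
closed_form = [5 / (big_n * (big_n + 1)) for big_n in support]
max_gap = max(abs(p - q) for p, q in zip(probs, closed_form))
```

In this sketch `max_gap` is small but not zero, because the truncated tail mass is redistributed over the retained support; raising `n_max` shrinks it.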

3.  POINT  ESTIMATION 

Bayes estimators of N may be obtained by combining (2.2) with appropriate loss functions;
examples are the posterior mode of N, MOD, and the posterior median of N, MED. Previous
authors, including OPZ, CL, and Casella (1986), have agreed that the relative mean squared error
of an estimator $\hat{N}$, equal to $E[(\hat{N}/N - 1)^2]$, is an appropriate loss function for this problem. The
Bayes estimator corresponding to this loss function is

$$\mathrm{MRE} = \sum_{N = x_{\max}}^{\infty} N^{-1} p(N \mid x) \Big/ \sum_{N = x_{\max}}^{\infty} N^{-2} p(N \mid x) .$$
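
The three Bayes estimators can be computed directly from the vague-prior posterior. The sketch below is illustrative only: the truncation point `n_max`, the function names, and the convention of taking the median as the smallest N whose cumulative probability reaches 1/2 are all assumptions, not taken from the source. MRE is the ratio of the posterior expectations of $N^{-1}$ and $N^{-2}$, which is what minimising the posterior expectation of $(\hat{N}/N - 1)^2$ gives.

```python
import math


def posterior_22(x, n_max=20000):
    """Vague-prior posterior of N on max(x), ..., n_max (assumes max(x) >= 1)."""
    n, s = len(x), sum(x)
    support = list(range(max(x), n_max + 1))
    log_w = []
    for big_n in support:
        lw = math.lgamma(n * big_n - s + 1) - math.lgamma(n * big_n + 2)
        lw -= math.log(big_n)
        for xi in x:                         # add log C(N, x_i) for each count
            lw += (math.lgamma(big_n + 1) - math.lgamma(xi + 1)
                   - math.lgamma(big_n - xi + 1))
        log_w.append(lw)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return support, [v / total for v in w]


def bayes_estimates(x, n_max=20000):
    """Return (MOD, MED, MRE) under the truncated vague-prior posterior."""
    support, probs = posterior_22(x, n_max)
    # MOD: value of N with the largest posterior probability.
    mod = support[max(range(len(probs)), key=probs.__getitem__)]
    # MED: smallest N with cumulative probability >= 1/2 (assumed convention).
    cum, med = 0.0, support[-1]
    for big_n, p in zip(support, probs):
        cum += p
        if cum >= 0.5:
            med = big_n
            break
    # MRE: E[1/N | x] / E[1/N^2 | x].
    num = sum(p / big_n for big_n, p in zip(support, probs))
    den = sum(p / big_n ** 2 for big_n, p in zip(support, probs))
    return mod, med, num / den


mod, med, mre = bayes_estimates([5])
```

For a single count the exact cumulative probability equals 1/2 at $2x_1 - 1$, so the median returned by this sketch depends on the tie convention and on the truncation; it should not be read as contradicting the value $2x_1$ given in Section 2.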

The  three  Bayes  estimators,  MOD,  MED,  and  MRE,  are  reasonably  stable,  as  can  be  seen 
from  the  results  for  the  eight  particularly  difficult  cases  listed  in  Table  2  of  OPZ,  which  are 
shown  in  Table  1.  MED  was  closer  to  the  true  value  of  N  than  the  other  estimators  considered  in 
four  of  the  eight  cases,  while  MOD  was  best  in  a  further  three  cases.  However,  in  the  cases  in 
which  MOD  was  best,  MED  performed  poorly;  the  converse  was  also  true.  The  other  three 
estimators  always  fell  between  MOD  and  MED. 


The results of a simulation study are shown in Table 2. I used the same design as OPZ and
CL. In each replication, $\theta$, N, and n were generated from uniform distributions on [0,1],
{1, ..., 100}, and {3, ..., 22} respectively, using the uniform random number generator of
Marsaglia, Ananthanarayanan, and Paul (1973). A binomial success count was then generated
using the IMSL routine GGBN. There were 2,000 replications.
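
The simulation design can be sketched as follows. This is a hedged illustration, not a reproduction of the study: it substitutes Python's built-in generator for the Marsaglia, Ananthanarayanan, and Paul generator and for IMSL GGBN, it assumes that n success counts are drawn per replication, and it skips all-zero samples (the source does not say how these were handled). The function names and the `estimator` argument are hypothetical.

```python
import random


def simulate_sample(rng):
    """Draw (N, theta, n, x) under the uniform design described in the text."""
    theta = rng.random()                 # uniform on [0, 1)
    big_n = rng.randint(1, 100)          # uniform on {1, ..., 100}
    n = rng.randint(3, 22)               # uniform on {3, ..., 22}
    # Each success count is a sum of N Bernoulli(theta) trials.
    x = [sum(rng.random() < theta for _ in range(big_n)) for _ in range(n)]
    return big_n, theta, n, x


def relative_mse(estimator, replications=2000, seed=0):
    """Average of (estimate / N - 1)^2 over the simulated samples."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(replications):
        big_n, _theta, _n, x = simulate_sample(rng)
        if max(x) == 0:
            continue                     # assumption: skip all-zero samples
        total += (estimator(x) / big_n - 1.0) ** 2
        used += 1
    return total / used if used else float("nan")


# Placeholder estimator for illustration only: the largest observed count.
baseline = relative_mse(max, replications=200, seed=1)
```

Any function that maps a list of counts to an estimate of N can be passed as `estimator`, so the same loop can compare several estimators on identical samples by reusing the seed.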


Table  2  about  here 


Table 2 shows that MRE performed somewhat better than MME:S and Mbeta(1,1) in both
stable and unstable cases, with an overall efficiency gain of about 10% over MME:S, and about
6% over Mbeta(1,1). Here, as in OPZ, a sample is defined to be stable if $\bar{x}/s^2 \ge 1 + 1/\sqrt{2}$, and
unstable otherwise, where $\bar{x} = \sum x_i / n$ and $s^2 = \sum (x_i - \bar{x})^2 / n$.
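
The stability criterion is easy to state in code. The sketch below is illustrative: it uses the sample mean and the variance with divisor n, and classifies a sample as stable when the mean-to-variance ratio is at least $1 + 1/\sqrt{2}$. Treating a zero-variance sample as stable is an assumption made here. The method of moments estimate, $\bar{x}^2/(\bar{x} - s^2)$, is included only to show why a ratio near 1 is troublesome: the denominator approaches zero.

```python
import math


def moments(x):
    """Sample mean and variance (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    return mean, var


def is_stable(x):
    """True if mean / variance >= 1 + 1/sqrt(2)."""
    mean, var = moments(x)
    if var == 0:
        return True                      # assumption: ratio treated as infinite
    return mean / var >= 1 + 1 / math.sqrt(2)


def method_of_moments(x):
    """Moment estimate of N; infinite when the mean does not exceed the variance."""
    mean, var = moments(x)
    if mean <= var:
        return math.inf
    return mean ** 2 / (mean - var)


stable_example = is_stable([5, 5, 5, 6])      # low variance relative to the mean
unstable_example = is_stable([1, 9, 2, 10])   # variance exceeds the mean
```

The two example samples are made up for illustration and do not come from OPZ or CL.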


4.  EXAMPLES 

CL analyzed two examples, involving counts of impala herds and individual waterbuck.
The  point  estimators  are  shown  in  Table  3.  The  stability  of  the  Bayes  estimators  is  again 
apparent;  the  stability  of  MRE  for  the  waterbuck  example  is  noteworthy  given  the  highly 
unstable  nature  of  this  data  set. 


Table  3  about  here 


The posterior distributions obtained from (2.2) are shown in Figures 1 and 2. The posterior
distribution for the waterbuck example has a very long tail; this may be related to the extreme
instability of this data set.


Figures 1 and 2 about here


Table 1. N Estimators for Selected and Perturbed Samples.

             Parameters                      Estimators
Sample    N   theta    n   MME:S   Mbeta(1,1)   MOD   MED   MRE
     1   75     .32    5      70           49    42    82    57
                              80           52    46    91    62
     2   34     .57    4      77           47    42    84    57
                              91           52    46    95    62
     3   37     .17   20      25           23    21    40    26
                              27           25    23    46    29
     4   48     .06   15      10            8     7    14    10
                              12           10    10    19    12
     5   40     .17   12      26           25    23    42    30
                              32           29    27    52    35
     6   74     .68   12     153          125   114   207   127
                             162          131   120   217   129
     7   55     .48   20      69           63    59    91    75
                              74           67    63   101    81
     8   60     .24   15      49           41    38    68    49
                              53           45    41    77    53

NOTE: The exact samples are given in Table 2 of OPZ. For each sample number, the
first entries are the N estimates for the original sample, and the second entries are the N
estimates for the perturbed sample obtained by adding one to the largest success count.


Table 2. Relative Mean Square Errors of the N Estimators

                               Estimators
Cases             No.   MME:S   Mbeta(1,1)    MRE
All cases        2000    .171         .165   .156
Stable cases     1378    .108         .104   .100
Unstable cases    622    .312         .300   .281


Table 3. Estimators for the Impala and Waterbuck Examples: Original and Perturbed Samples

                          Estimators
Example      MME:S   Mbeta(1,1)   MOD   MED   MRE
Impala          54           42    37    67    49
                63           46    40    76    54
Waterbuck      199          140   122   223   131
               215          146   127   232   132

NOTE: The data are given in Section 4 of CL. For each example, the first entries are the
N estimates for the original sample, and the second entries are the N estimates for the
perturbed sample obtained by adding one to the largest success count.




